cdk.lf, moe.lf, bci.lf: Functions to parse lines from fingerprint files

Description

Each of these functions takes a single line and parses it to produce a vector of integers representing the positions of the 'on' bits in a fingerprint. This allows the user to use read.fp with arbitrary fingerprint files: a new file format can be handled by defining a new line parser function. Currently, cdk.lf, moe.lf, bci.lf and fps.lf process fingerprint files obtained from the CDK (http://cdk.sourceforge.net), MOE (http://chemcomp.com), BCI (http://www.digitalchemistry.co.uk/) and the FPS format (http://code.google.com/p/chem-fingerprints/wiki/FPS), respectively. ecfp.lf can be used for any fingerprint that generates hashed features (such as ECFPs or other circular fingerprints). For these cases, it is assumed that features are unsigned integers, so string features are not handled.

Note that when the fps.lf function is specified, items such as the number of bits or the header flag do not need to be specified, as the format requires a header block containing some of these items.

Usage

cdk.lf(line)
moe.lf(line)
bci.lf(line)
ecfp.lf(line)
fps.lf(line)
jchem.binary.lf(line)

Arguments

line

The line to parse

Value

A list with three components. The first is the name associated with the fingerprint (if available). The second is either a vector of integers representing the bits set to 1 (for the first three methods) or a vector of characters representing hashed features (characteristic of circular fingerprints) or, more generally, any string feature. The third component is a (possibly empty) list containing the remaining items of a line, for formats that allow items other than a title and the fingerprint (such as the FPS format). Its content depends on the line function being used.
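
As an illustration of defining a new line parser, the following is a minimal sketch of one that returns the three-component list described above. The function name my.lf and the input format (a name followed by whitespace-separated bit positions) are hypothetical and chosen only for this example; they are not part of the package. The sketch also assumes the list components are accessed by position rather than by name.

```r
# Hypothetical line parser for an assumed format: "<name> <bit> <bit> ..."
# (whitespace-separated). Neither the name my.lf nor this format is part
# of the package; both are assumptions made for illustration.
my.lf <- function(line) {
  # Split the trimmed line on runs of whitespace
  tokens <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  name <- tokens[1]                 # first token: the fingerprint name
  bits <- as.integer(tokens[-1])    # remaining tokens: positions of the 'on' bits
  # Third component is an empty list, since this format has no extra fields
  list(name, bits, list())
}

parsed <- my.lf("mol1 3 17 42")
str(parsed)
```

A parser written this way would then be handed to read.fp in place of one of the built-in line functions. Check the read.fp documentation for the exact argument used to supply it.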